Sentiment analysis of video danmakus based on MIBE-RoBERTa-FF-BiLSTM

Danmakus are user-generated comments overlaid on videos, enabling real-time interaction between viewers and video content. The emotional orientation of danmakus reflects viewers' attitudes and opinions toward video segments, which can help video platforms optimize content recommendation and evaluate users’ abnormal emotion levels. To address the low transferability of traditional sentiment analysis methods to the danmaku domain, the low accuracy of danmaku text segmentation, the poor consistency of sentiment annotation, and insufficient semantic feature extraction, this paper proposes a video danmaku sentiment analysis method based on MIBE-RoBERTa-FF-BiLSTM. This paper constructs a “Bilibili Must-Watch List and Top Video Danmaku Sentiment Dataset” covering 10,000 positive and negative sentiment danmaku texts across 18 themes. A new word recognition algorithm based on mutual information (MI) and branch entropy (BE) is used to discover 2610 irregular popular Internet neologisms, from trigrams to heptagrams, in the dataset, forming a domain lexicon. Maslow’s hierarchy of needs theory is applied to guide consistent sentiment annotation. The domain lexicon is integrated into the feature fusion layer of the RoBERTa-FF-BiLSTM model so that the model fully learns the semantic features of the word information, character information, and context information of danmaku texts and performs sentiment classification. Comparative experiments on the dataset show that the proposed model has the best overall performance among mainstream models for video danmaku text sentiment classification, with an F1 value of 94.06%, and that its accuracy and robustness are also better than those of the other models. The limitations of this paper are that constructing the domain lexicon still requires manual participation and review, and that the semantic information of the danmaku video content and the positive case preference remain unaddressed.


With the development of social media and video websites, user comments are rapidly increasing in quantity and diversity of form. Danmaku is a new type of user-generated comment that scrolls across different positions on the video screen 1. Users communicate with video producers and other users by posting danmakus containing emotions such as praise, sarcasm, ridicule, criticism, and compliments 2,3. As an emerging information carrier, danmaku contains rich and authentic semantic information and is an important corpus for sentiment analysis 4, so the sentiment analysis of danmakus has important academic and commercial value. In the academic field, danmaku sentiment analysis helps to explore emotional characteristics, expand the research field of sentiment analysis, and enrich existing research theories and related technologies 5. In the commercial field, it can effectively capture the feedback of different users on video content and help video platforms optimize video content recommendation and danmaku management strategies [6][7][8][9]. In the field of digital governance, it can be used to assess the abnormal emotion level of users, providing new methods for detecting abnormal events on the Internet and monitoring users' mental health 10.
Sentiment analysis is a specific application of natural language processing, machine learning and other technologies to extract feelings, emotions, opinions and attitudes in text data. Sentiment analysis based on

Lexicon-based danmaku sentiment analysis
Lexicon-based danmaku sentiment analysis constructs a sentiment lexicon of positive and negative sentiment words, segments danmakus and matches the resulting words against the lexicon, and then uses an algorithm to calculate sentiment values and classify the danmakus 19. Zheng et al. 20 calculated danmaku sentiment values and categorized danmakus using the semantic weighting of sentiment words. Wang and Xu 21 combined universal sentiment lexicons with multidimensional danmaku sentiment lexicons to construct a dedicated danmaku sentiment lexicon, and then studied the trend of danmaku sentiment values over time. Li et al. 22 combined the lexicon-based danmaku sentiment categorization method with the naive Bayes method. Liu et al. 23 and Zeng et al. 24 classified danmaku emotions with an emotion lexicon to analyze their influence on consumers' purchase intention. Jin et al. 25 fused an improved word forest with the HowNet similarity computation algorithm, organized a multidimensional lexicon according to seven human emotion dimensions, and measured danmaku emotion values with an improved emotion value computation method.
Sentiment lexicon-based approaches rely heavily on the quality and coverage of the sentiment lexicon, which limits their scalability and objectivity. The meanings of sentiment words may vary with context and time, further limiting the lexicon 26. In addition, developing sentiment lexicons and judgment rules requires a great deal of manual design and prior knowledge, and the difficulty of sentiment annotation makes the quality of lexicons uneven. The development of social media has led to the continuous emergence of new online terms in danmakus, and sentiment lexicons struggle to keep pace with the diversity and variability of danmakus. Therefore, the performance of lexicon-based danmaku sentiment analysis methods is unsatisfactory.
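The coverage problem described above can be illustrated with a minimal sketch of lexicon matching. The tiny lexicon, the polarity weights, and the function names below are hypothetical and are not taken from any of the cited works; the input is assumed to be already segmented into words.

```python
# Hypothetical toy lexicon: word -> polarity weight (assumption, for illustration only)
SENTIMENT_LEXICON = {"好看": 1.0, "感动": 1.0, "喜欢": 1.0, "无聊": -1.0, "难看": -1.0}


def lexicon_score(tokens, lexicon=SENTIMENT_LEXICON):
    """Sum the polarity weights of the tokens found in the lexicon."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


def classify(tokens):
    """Map the summed score to a label; a score of zero means no lexicon evidence."""
    score = lexicon_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unknown"


print(classify(["这个", "视频", "好看"]))  # a word covered by the lexicon
print(classify(["蚌埠住了"]))              # a neologism the lexicon does not cover
```

In this sketch, a danmaku made up only of uncovered neologisms receives no sentiment evidence at all, which is the limitation the paragraph above describes.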
RQ1: How can annotators label danmaku sentiment tendencies accurately, quickly, simply, reasonably, and consistently in conjunction with the video content?
RQ2: How can non-regular popular words in danmaku texts be discovered efficiently, so that danmaku texts are cut into more reasonable words and the quality of word embeddings in danmaku sentiment classification models is improved?

Machine learning-based danmaku sentiment analysis
Machine learning-based danmaku sentiment analysis preprocesses danmaku data, constructs datasets, selects and vectorizes text features, and trains machine learning models for danmaku sentiment classification. Yang Deng et al. 6 proposed a Multi-Topic Emotion Recognition (MTER) algorithm for video clips based on the Latent Dirichlet Allocation (LDA) model, which utilizes the implicit emotional dependencies of the words in each danmaku to compute emotion values and emotion vectors. Jun Xu et al. 27 used the naive Bayes and maximum entropy methods to improve the accuracy of comment sentiment classification by selecting semantically inclined words as feature terms, correctly handling negations, and using binary values as feature term weights. Shang et al. 28 used SnowNLP in conjunction with LDA topic modeling for sentiment analysis of multi-class review data. Hu et al. 29 combined jieba word segmentation with multinomial naive Bayes to construct a classifier for comment sentiment classification.
Machine learning-based methods require a large amount of labeled data and appropriate feature extraction methods, and place higher demands on the classification model. At the same time, these methods cannot fully utilize contextual information, which affects classification accuracy to a certain extent.
RQ3: How to extract semantic and structural information of danmaku text words in different contexts to effectively capture contextual information?

Deep learning-based danmaku sentiment analysis
Deep learning-based danmaku sentiment analysis uses multilayer neural networks. Ye et al. 30 proposed a data collection algorithm based on hotspot detection and a model that analyzes danmaku sentiment using a danmaku sentiment lexicon and a convolutional neural network. Wang et al. 31 categorized the four emotions of happiness, anger, sadness, and joy with a danmaku sentiment data analysis model based on BiLSTM. Bai et al. 32 compared how models such as logistic regression, support vector machines, and recurrent neural networks predict the positive or negative sentiment of danmaku comments and reflected the results in video sentiment curves. Li and Mou 33 used ERNIE and TextCNN to fuse the textual and temporal features of danmakus, and then used BiLSTM to perform sentiment analysis on the feature-fused vectors. Li et al. 34 constructed a seed sentiment lexicon to compute danmaku text similarity for sentiment recognition of very short danmaku texts, and used BiLSTM combined with the BERT model for sentiment recognition of regular danmaku texts. Li et al. 35 used the XLNet model to evaluate the overall sentiment of danmaku comments as pessimistic or optimistic.
Deep learning-based methods have stronger feature learning capabilities, which reduces the cost of building and selecting features 36, but they depend on large amounts of data and are prone to data sparsity and overfitting on small datasets 37. In addition, mainstream pre-trained text models use different subword tokenization methods: BERT uses WordPiece, XLNet uses SentencePiece, and RoBERTa uses Byte-Pair Encoding. When a Chinese word segmentation tool is applied first, the tokenizer often produces the same output for the segmented corpus as for the unsegmented one, so the word-level semantic information is lost and model performance degrades.
RQ4: How to effectively extract the word information after Chinese word segmentation when the Chinese word segmentation method is inconsistent with the tokenizer method when using the text pre-training model?
Building on the above studies, this paper proposes a danmaku sentiment analysis model based on MIBE-RoBERTa-FF-BiLSTM. A neologism recognition algorithm based on mutual information (MI) and branch entropy (BE) identifies non-regular popular words in danmaku texts, so that a domain lexicon for accurate Chinese word segmentation can be constructed quickly. At the same time, Maslow's hierarchy of needs theory is applied to guide consistent sentiment annotation, and the RoBERTa pre-trained model, a feature fusion layer, and a BiLSTM model are combined to adequately extract the semantic features of danmaku texts, which effectively improves the ability to analyze their sentiment tendency.

Ethical approval
This article does not contain any studies with human participants performed by any of the authors.

Model design
This paper proposes a danmaku sentiment analysis method based on MIBE-RoBERTa-FF-BiLSTM, which comprises the construction of a danmaku domain lexicon based on the MIBE neologism recognition algorithm, danmaku text sentiment annotation based on Maslow's hierarchy of needs theory, and the RoBERTa-FF-BiLSTM sentiment analysis model. The overall framework of the research methodology is shown in Fig. 1.

Danmaku domain lexicon construction based on MIBE neologism recognition algorithm
Danmaku texts are often highly colloquial and contain a large number of non-standard popular words. For example, "破防了" was originally a game term for using skills to break through defenses, but on the Internet it expresses being moved by sentimental or exciting videos; "蚌埠住了" is a homophone of "绷不住了", which describes being so strongly affected that one cannot help laughing or having an emotional meltdown. Recognizing these neologisms and manually annotating them in the lexicon improves the accuracy of the downstream danmaku sentiment analysis task.

This paper proposes a danmaku neologism recognition algorithm based on mutual information and branch entropy. The algorithm incrementally expands candidate words by calculating their mutual information with right neighbors, identifying potential neologisms. A screening step based on branch entropy then eliminates candidates with small left and right neighbor entropy, along with stop words at the beginning and end of candidate neologisms. After the recognized new words are compared with the existing phrases in the corpus, the meaningful new words are retained and the danmaku domain lexicon is formed automatically; the new words are also added to the word segmentation lexicon to improve its quality. This approach improves lexicon precision and captures the language nuances specific to danmaku interactions. The specific steps are as follows.

Mutual Information (MI) is a common measure of the degree of co-occurrence of two variables in a corpus; the larger the value, the stronger the dependence between the two objects. In neologism discovery, it is used to check whether the co-occurrence of two or more characters in the corpus reaches a certain threshold. The calculation is shown in (1):

$$\mathrm{MI}(A, B) = \log \frac{p(A, B)}{p(A)\,p(B)} \tag{1}$$

where $p(A)$ and $p(B)$ denote the probabilities of word or phrase $A$ and $B$ appearing individually in the corpus, $p(A, B)$ denotes the joint probability of $A$ and $B$ co-occurring in the corpus, and $\mathrm{MI}(A, B)$ denotes the degree of dependency between $A$ and $B$. If $\mathrm{MI}(A, B) > 0$, the probability of $A$ and $B$ co-occurring is greater than the product of their individual probabilities, which means that the two may be related; the larger the value of MI, the stronger the correlation and the more likely they are to form a new word. If $\mathrm{MI}(A, B) \le 0$, $A$ and $B$ co-occur no more often than they would if they were distributed independently in the corpus, so they are unlikely to form a word.
Branch Entropy (BE) measures whether the neighboring characters of a candidate new word are sufficiently varied; the larger the value, the more information the neighboring characters carry and the higher the probability that the candidate forms a word. The left and right neighbor entropies are calculated as shown in (2) and (3):

$$\mathrm{BE}_L(W) = -\sum_{W_L \in S_l} P(W_L \mid W)\,\log P(W_L \mid W) \tag{2}$$

$$\mathrm{BE}_R(W) = -\sum_{W_R \in S_r} P(W_R \mid W)\,\log P(W_R \mid W) \tag{3}$$

where $S_l$ is the set of left neighbors of candidate word $W$, $S_r$ is the set of right neighbors of candidate word $W$, $P(W_L \mid W)$ denotes the conditional probability that $W_L$ is the left neighbor of $W$, and $P(W_R \mid W)$ denotes the conditional probability that $W_R$ is the right neighbor of $W$. These conditional probabilities are computed as shown in (4) and (5):

$$P(W_L \mid W) = \frac{N(W_L, W)}{N(W)} \tag{4}$$

$$P(W_R \mid W) = \frac{N(W_R, W)}{N(W)} \tag{5}$$

where $N(W_L, W)$ denotes the number of times $W_L$ and $W$ co-occur and $N(W)$ denotes the number of times $W$ occurs. Similarly, $N(W_R, W)$ denotes the number of times $W_R$ and $W$ appear together. Taking the Internet buzzword "心态崩了" as an example, the prerequisite for it to become a separate word is that "心态" and "崩了" co-occur in the corpus at a high frequency, and that the words distributed around it are sufficiently random.
Word-by-word expansion over the unsegmented danmaku corpus is mainly used to recognize neologisms of three or more characters. Taking the neologism "蚌埠住了" as an example, after the two-character candidate "蚌埠" is counted, the algorithm shifts to the right, calculates the mutual information between "蚌埠" and "住", and finally expands the candidate to "蚌埠住了". Candidates are selected by mutual information, those with low branch entropy are eliminated, stop words at the beginning and end are removed, and existing words are excluded to obtain the new word set. In addition, the method lets the danmaku lexicon evolve dynamically: new words that may contain function words at the beginning or end are excluded, and the remaining new words are compared with those already in the danmaku lexicon and added without repetition. This approach improves the quality of word segmentation and addresses the problems of unrecognized new words, repetitions, and garbage strings. In this paper, a total of 9851 neologisms from trigrams to heptagrams (three to seven characters) were identified by the above method, and after manual checking and review, 2610 meaningful neologisms were retained to constitute the danmaku neologism lexicon. Table 1 shows some of the neologisms and their manual annotations.
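The two screening quantities described above, mutual information and left/right branch entropy, can be sketched on a toy corpus as follows. This is an illustrative sketch rather than the authors' implementation: the four-sentence corpus, the function names, the base-2 logarithm, and the choice to estimate probabilities by dividing n-gram counts by the total number of characters are all assumptions, and neighbor probabilities are normalized over the occurrences that actually have a neighbor.

```python
import math
from collections import Counter


def ngram_counts(corpus, max_n):
    """Count every character n-gram (1 <= n <= max_n) in a list of strings."""
    counts = Counter()
    for text in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def mutual_information(a, b, counts, total_chars):
    """log2(p(ab) / (p(a) * p(b))) for the adjacent strings a and b."""
    if counts[a + b] == 0 or counts[a] == 0 or counts[b] == 0:
        return float("-inf")
    p_ab = counts[a + b] / total_chars
    p_a = counts[a] / total_chars
    p_b = counts[b] / total_chars
    return math.log2(p_ab / (p_a * p_b))


def branch_entropy(word, corpus, side):
    """Entropy of the characters seen immediately left or right of `word`."""
    neighbors = Counter()
    for text in corpus:
        start = text.find(word)
        while start != -1:
            idx = start - 1 if side == "left" else start + len(word)
            if 0 <= idx < len(text):  # occurrences at a text boundary have no neighbor
                neighbors[text[idx]] += 1
            start = text.find(word, start + 1)
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


# Hypothetical toy corpus (assumption, for illustration only)
corpus = ["这真的蚌埠住了哈哈", "太感人了蚌埠住了啊", "蚌埠住了吧", "我蚌埠住了"]
counts = ngram_counts(corpus, max_n=4)
total_chars = sum(len(text) for text in corpus)

print(mutual_information("蚌埠", "住了", counts, total_chars))
print(branch_entropy("蚌埠住了", corpus, "left"), branch_entropy("蚌埠住了", corpus, "right"))
print(branch_entropy("蚌埠住", corpus, "right"))
```

On this toy corpus the fragment "蚌埠住" is always followed by "了", so its right branch entropy is zero and it would be screened out, while the full candidate "蚌埠住了" has varied neighbors on both sides. Any thresholds applied to these values would have to be chosen for the real corpus; none are given here.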

Danmaku emotion annotation based on Maslow's hierarchy of needs theory
The danmaku texts contain popular Internet neologisms whose implied meanings must be analyzed in combination with the video content, which makes emotion annotation difficult. It is widely recognized that individuals' emotions are influenced by internal needs and external stimuli: when an individual's needs are met, positive emotions arise; otherwise, negative emotions are generated 38. Therefore, this paper decomposes and maps the hierarchy of needs contained in danmaku content, which, combined with the video content, allows a more accurate judgment of danmaku emotions. This paper adopts Maslow's hierarchy of needs theory, which includes seven levels (physiological, safety, belonging and love, self-esteem, cognitive, aesthetic, and self-actualization needs), to guide the labeling of danmaku emotions.
This paper invited 10 senior Bilibili users to watch the videos and then label the sentiment polarity of the danmaku texts using this method. Compared with labeling without the method, the difficulty of labeling is greatly reduced, and labeling speed and accuracy are significantly improved. Examples of the labeling results are shown in Table 2.
Danmaku text is loosely structured and contains a large number of special characters, such as numbers, meaningless symbols, traditional Chinese characters, and Japanese. These characters, which carry only a small amount of emotional information, introduce noise into the neural network, so this paper removes this redundant information with regular expressions. This paper also visualizes and analyzes danmaku length, as shown in Fig. 2, and finds that it is mainly distributed between 5 and 45 characters; danmaku texts longer than 100 characters or shorter than 5 characters are therefore excluded.
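The cleaning and length filtering described above might look like the following minimal sketch. The character classes kept by the pattern, the order of the two steps (cleaning before filtering), and the function names are assumptions; the paper does not give its regular expressions. The sketch does not detect traditional Chinese characters, and Japanese kanji that fall in the same Unicode range as Chinese characters would pass through.

```python
import re

# Assumed pattern: keep CJK unified ideographs and ASCII letters, drop everything else
# (digits, punctuation, symbols, kana).
_NOISE = re.compile(r"[^\u4e00-\u9fffA-Za-z]")


def clean_danmaku(text):
    """Remove characters outside the kept classes."""
    return _NOISE.sub("", text)


def keep_by_length(text, min_len=5, max_len=100):
    """Keep texts whose length is between 5 and 100 characters inclusive."""
    return min_len <= len(text) <= max_len


# Hypothetical raw danmakus (assumption, for illustration only)
raw = ["哈哈哈哈哈哈233333!!!", "好", "这真的蚌埠住了~~~", "すごい"]
cleaned = [clean_danmaku(text) for text in raw]
kept = [text for text in cleaned if keep_by_length(text)]
print(kept)
```

The bounds of 5 and 100 characters follow the exclusion rule stated in the text; everything else in the sketch is illustrative.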

RoBERTa-FF-BiLSTM sentiment analysis model
This paper uses the pre-trained RoBERTa model to extract the deep semantic information in danmaku texts. The vectors of the characters belonging to each word obtained by Chinese word segmentation are then fused, so that the word embedding vectors output by the RoBERTa model contain more fine-grained information about the Chinese corpus. The fused vectors are input into the BiLSTM model, which processes the contextual information of the danmaku text for sentiment classification. The model structure is shown in Fig. 3.
Table 1 (excerpt). Neologisms and their manual annotations.
No. | Neologism | Manual annotation
… | … | The interaction between the viewer and the video producer, similar to the interactive behavior of "liking": when the producer asks viewers to like the video, the viewer expresses approval of the author, or teases or mocks the poor quality of the video
4 | My youth is making a comeback | Abbreviation of "my youth is back"; used to express surprise and admiration when a remembered thing or person returns in a different state of appearance
… | have no martial ethics | Internet buzzword: indicates that the behavior of the characters or the content of the video catches viewers off guard and is completely unexpected; mostly used to tease the video producer or to mock commercials inserted in the video; must be judged in combination with the video content

By increasing the randomness and diversity of the pre-training data, RoBERTa can better learn the deep semantic information of text and improve the accuracy of downstream text classification tasks. The RoBERTa model is a bidirectional Transformer encoder based on the Bidirectional Encoder Representations from Transformers (BERT) model, which mainly uses the Transformer encoder for computation. Each encoder module is composed of three parts: a multi-head attention mechanism, residual connection and layer normalization, and a feed-forward neural network, as shown in Fig. 4.

In Fig. 4, the word vectors are obtained by transforming the words of the input text through one-hot encoding, the positional encoding indicates the relative or absolute position of each word in the sequence, and the word embedding vectors generated by adding the two are used as the input $X$. The multi-head attention mechanism, a form of self-attention, is the core unit of the Transformer encoder; it uses multiple independent attention modules to operate on the input in parallel, as shown in (6):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \tag{6}$$

where $\{Q, K, V\}$ are the input matrices and $d_k$ is the input matrix dimension. After the multi-head self-attention computation, the resulting hidden vector passes through two further operations before reaching the next layer: residual connection and layer normalization. The residual connection adds the input $X$ to the result of the nonlinear transformation, and layer normalization normalizes the result using its mean and variance. The output is then processed by the two fully connected layers of the feed-forward neural network, as shown in (7):

$$\mathrm{FFN}(X) = f(X W_e + b_e)\,W_0' + b_0' \tag{7}$$

where $f$ is the activation function of the first layer, $\{W_e, W_0'\}$ are the weight matrices of the two fully connected layers, and $\{b_e, b_0'\}$ are their bias terms.

After each word embedding vector in the input layer is encoded by the RoBERTa layer, bidirectional correlations between word embedding vectors are established, which enables the model to learn the semantic features of each word embedding vector in different contexts. For example, in "这真的蚌埠住了" and "太感人了蚌埠住了", the word "蚌埠" expresses very different semantics in different contexts, and the RoBERTa model, pre-trained on large-scale text, can derive a context-dependent representation for it.

Before being input to the BiLSTM layer, the word embedding vectors output by RoBERTa are processed by the feature fusion layer, where the vectors of the characters belonging to each word in the lexicon are fused, so that the word embedding vectors output by the RoBERTa model contain more fine-grained Chinese corpus information. In the feature fusion layer, the jieba lexicon is first used to segment the text. For example, for the sentence "This is really Bengbu lived", the jieba segmentation tool divides the sentence into ['this', 'really', 'Bengbu', 'lived', 'had']. The number of characters contained in each word of the sentence is then counted to obtain the vector [1,1,1,2,2]. Once the embedding vectors output by RoBERTa are obtained, the vectors of the characters in the same word are averaged and filled back into their original positions, which realizes the feature fusion; the logical structure is shown in Fig. 5.
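The averaging step of the feature fusion layer can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: it assumes one embedding vector per character (special tokens already removed), takes the word lengths as a given list rather than calling jieba, and uses plain Python lists in place of tensors. The function name and the toy vectors are hypothetical.

```python
def fuse_word_features(char_vectors, word_lengths):
    """Average the vectors of the characters in each word and write the mean
    back to every character position of that word."""
    if sum(word_lengths) != len(char_vectors):
        raise ValueError("word lengths must cover the character sequence exactly")
    fused = []
    start = 0
    for length in word_lengths:
        span = char_vectors[start:start + length]
        # Column-wise mean over the characters of this word
        mean = [sum(column) / length for column in zip(*span)]
        fused.extend(list(mean) for _ in range(length))
        start += length
    return fused


# Toy example: four characters with 2-dimensional vectors, segmented into
# words of lengths 1, 2 and 1 (hypothetical values).
char_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [2.0, 2.0]]
fused = fuse_word_features(char_vectors, [1, 2, 1])
print(fused)
```

The output sequence has the same length as the input, so it can be passed to a downstream sequence model unchanged; characters of the same word simply share one averaged vector.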
In a unidirectional LSTM, neuron states are propagated from front to back, so the model can take only past information into account, not future information 39, which prevents the LSTM from performing complex sentiment analysis tasks well. For example, in the danmaku "专家说的挺好的,下次别说了", the first clause literally expresses positive appreciation, but only in the context of the following clause can it be judged that the sender is deriding and teasing, and that the danmaku does not express a positive emotion. Handling such cases requires a bidirectional LSTM. The bidirectional long short-term memory network (BiLSTM) is composed of an LSTM that processes the sequence forward and an LSTM that processes it in reverse, as shown in Fig. 6. This paper places a BiLSTM layer after the RoBERTa layer and uses it to extract features from the contextual information of the input texts, which effectively compensates for the RoBERTa layer's insufficient consideration of contextual information.

In (8), $n$ is the dimension of the feature vector obtained for the sentence after pre-training, $a_i \in \mathbb{R}^{d_a}$, and $b_a$ is a bias vector of dimension $d_a$. The bidirectional LSTM is computed over the hidden layers in two different directions, and the hidden vectors $\overrightarrow{h}$ and $\overleftarrow{h}$ of the last layer of the forward and backward LSTMs are merged to form the output vector $V_i$ at moment $i$, as given in (9). The output is passed through a fully connected layer, with the Tanh function used as the activation function $g_2$ to add nonlinearity to the hidden layer computation, as given in (10). In (10), $W_h^d \in \mathbb{R}^{d_a \times d_h}$ is the weight matrix applied to $a$ for direction index $d$; $U$ is the weight matrix applied to the output $b$ of the hidden layer at moment $i-1$; $d \in \{0, 1\}$ denotes the direction in the hidden layer; and $b_b^d \in \mathbb{R}^{d_h}$ is the bias vector for direction index $d$. Afterwards, all $h_d$ in the hidden layer are combined to form the final sentence-level feature vector $H$. The feature vector $H$ is fed into the fully connected layer, and the ReLU activation function is used. The output of the fully connected layer

Parameter settings
To evaluate the performance of the method proposed in this paper on the danmaku sentiment analysis task, experiments were conducted on NVIDIA GeForce RTX3060 using Python 3.

Experimental results and analysis
The results of the comparison experiment are shown in Table 3.
In order to visually compare the performance of each comparative model, this paper, based on Table 3, draws Fig. 7 (performance statistics of mainstream baseline model for sentiment analysis), Fig. 8 (performance statistics of mainstream baseline model with the introduction of the jieba lexicon and the FF layer), Fig. 9 (performance statistics of mainstream baseline model with the introduction of the MIBE-based lexicon and the FF layer), and Fig. 10 (comprehensive statistics of the performance of the sentiment analysis model), respectively.
Based on Fig. 7, among the mainstream baseline models, the RoBERTa-BiLSTM model performs best, with accuracy, recall, and F1 value all exceeding 93.85%. This indicates that the RoBERTa pre-trained model adequately extracts the semantic and structural information of danmaku text, and that the bidirectional sequence modeling capability of BiLSTM then effectively captures contextual information, fits the varying lengths and linguistic diversity of danmaku text, better models the dependency relationships in the text, and improves the model's ability to classify emotional tendencies. The BERT-BiLSTM model performs slightly worse than RoBERTa-BiLSTM on this task, probably because RoBERTa uses a larger dataset, a larger batch size, a higher learning rate, and a better masking strategy than BERT during pre-training, which makes it more robust and better optimized. XLNET-BiLSTM has the worst performance on this task, probably because XLNET adopts an improved autoregressive training approach, which may make it less effective for sentiment classification than BERT and RoBERTa, which adopt an autoencoding training approach; in addition, chinese-roberta-wwm-ext and chinese-bert-wwm-ext are pre-trained specifically on Chinese text, so their lexical and grammatical comprehension of Chinese is stronger and they are better suited to the Chinese sentiment classification task. The RoBERTa-TextCNN model achieves good performance on this task, but there is a gap between it and the BERT-BiLSTM and RoBERTa-BiLSTM models, probably because TextCNN slides convolution kernels over the text sequence to capture local information and obtains global information through the pooling layer; its ability to capture global long-distance dependencies is relatively weak, it cannot effectively handle the varying lengths and linguistic diversity of danmaku text, and it may suffer from information loss or noise interference. The RoBERTa-RNN and RoBERTa-LSTM models perform slightly worse on this task, probably because the RNN is weak at capturing the information before and after a position in the danmaku text sequence and is prone to vanishing and exploding gradients, while the LSTM relies on unidirectional information transfer only, so it uses information less fully than BiLSTM, easily ignores important contextual information, and is weaker at capturing bidirectional dependencies in danmaku text.

The neural network and machine learning methods that do not use pre-trained models perform worst overall, far below the methods that use pre-trained models. Among them, the SVM model performs relatively well, with accuracy, recall, and F1 values all exceeding 88.50%. SVM generalizes well on binary classification problems, but it depends heavily on the selection and representation of features, and the complex semantic features of danmaku texts may exceed its processing ability. The BiLSTM model ranks second; without the support of a pre-trained model it learns only simple temporal information and has difficulty learning the deep and rich linguistic knowledge in danmaku texts. The BernoulliNB model performs worst, as it requires binarization of the data, which causes some information loss and affects the quality and integrity of the data.

Based on Fig. 8, after the jieba segmentation lexicon and the FF feature fusion layer are introduced into the mainstream baseline models, the performance of each model improves; in particular, the F1 values of XLNET-BiLSTM, RoBERTa-LSTM, RoBERTa-RNN, and RoBERTa-TextCNN all increase by more than 0.1%. This indicates that the jieba lexicon cuts Chinese danmaku text into more reasonable words, reduces noise and ambiguity, and improves the quality of word embeddings. The FF feature fusion layer averages the feature encodings of the characters in each word to obtain a word encoding that retains the word's semantic information, which makes Chinese word segmentation meaningful, helps to eliminate the effect of polysemy, expresses more accurate information, and enhances the model's semantic comprehension and generalization ability.
Based on Fig. 9, it can be found that after adding MIBE neologism recognition to the model in Fig. 7, the performance of each model is improved, especially the accuracy and F1 value of RoBERTa-FF-BiLSTM,

Discussions
This paper classifies the sentiment of danmaku texts with the MIBE-RoBERTa-FF-BiLSTM model. The neologism recognition algorithm based on mutual information (MI) and branch entropy (BE) effectively discovers non-regular popular neologisms of three or more characters in danmaku text, such as "爷青回" and "蚌埠住了", and, together with the jieba segmentation lexicon, cuts the text into more reasonable words. This reduces noise and ambiguity, improves the quality of word embeddings, adapts more efficiently to the characteristics and variations of danmaku texts than traditional lexicons, and captures fresh and interesting expressions in the text, which addresses RQ2. Based on Maslow's hierarchy of needs theory, this paper argues that the emotion in danmaku text is jointly generated by individual needs and external stimuli. Parsing the hierarchy of needs in danmaku and combining it with the video content improves the reasonableness and consistency of emotion annotation, reflects the psychological state and motivation of danmaku users better than traditional annotation, and reduces the difficulty of annotation, which addresses RQ1. This paper uses the RoBERTa-FF-BiLSTM model to learn the semantic features of danmaku texts. The RoBERTa pre-trained model fully extracts the semantic and structural information of danmaku text, learns deeper and richer linguistic knowledge than traditional machine learning or shallow neural network methods, and improves the model's ability to generalize on small-scale data, which addresses RQ3. In the feature fusion layer, the feature encodings of the characters in each segmented word are averaged and filled back into the original positions; compared with traditional word encoding methods, this yields a word encoding that retains the word's semantic information and eliminates the effect of polysemy, which addresses RQ4. The BiLSTM model captures the contextual information of danmaku text, efficiently fits its varying lengths and linguistic diversity, and adapts to danmaku text of different lengths and styles better than traditional convolutional or recurrent neural network approaches.

However, this study has the following limitations and challenges. (1) Danmaku text contains a large number of popular new Internet words, such as self-made words, abbreviations, and interactive words, which increase the difficulty of understanding the semantics and emotional expression of danmaku text. The MIBE-based new word recognition algorithm used in this paper automatically discovers irregular popular words in danmaku text, but manual semantic understanding and evaluation in combination with the context and video content is still needed to judge the quality of the new words and eliminate invalid ones, so the degree of automation is limited.
The manual review process is not only time-consuming and labor-intensive, but may also introduce subjective bias and inconsistency, affecting the quality and reliability of the new word dictionary. (2) This paper only considers the danmaku text content and ignores the semantic information of the video content, so its understanding and evaluation of the emotional expression of danmaku is relatively one-dimensional. Danmaku text and video content are interrelated and influence each other. The emotional tendency and expression of danmaku users are often stimulated and guided by the video content. Analyzing the danmaku text alone may miss important emotional and contextual information, so the sentiment analysis results may not be sufficiently accurate or comprehensive. (3) The comparative experiment used the training paradigm of "pre-trained model + neural network classifier". Although it achieved high performance, it also had some problems. The recall of the models under this paradigm was higher than their accuracy and F1 values, indicating a deviation between the predictions and the true labels: the models were more inclined to predict negative cases as positive cases, resulting in a positive case preference.
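The positive case preference described in limitation (3) can be made concrete with a small worked example. The sketch below is illustrative only: the confusion-matrix counts are invented toy numbers, not results from this paper, and the function name `classification_metrics` is hypothetical. It shows that when false positives outnumber false negatives, recall exceeds precision, accuracy, and F1.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, accuracy and F1 for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Toy counts (assumed for illustration): the classifier labels many
# negative danmakus as positive (fp=20) but rarely misses a positive (fn=5).
metrics = classification_metrics(tp=90, fp=20, fn=5, tn=85)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In this toy case recall is the largest of the four values because only five positives are missed, while the twenty false positives pull precision, accuracy, and F1 down.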
In the future, the following aspects can be considered for optimization and improvement: (1) We will perform topic identification on the danmaku new words identified by the MIBE algorithm, design corresponding prompts, and use large language models such as GPT-4, Llama 2, and ChatGLM3 to perform semantic analysis and quality evaluation of the new words more logically and efficiently, filter out meaningful and useful new words, automatically construct the danmaku new word dictionary, and reduce the workload and error of manual review. (2) We will use multimodal representation methods such as CLIP to extract and fuse features of the danmaku text and video content, capture the interactive emotional information between them, and understand the emotional expression of danmaku more comprehensively and accurately. (3) We will design a text quality evaluation and error correction model of the form "pre-trained model (T5, MacBERT, etc.) + external knowledge base" to evaluate and score danmaku texts with positive and negative emotions, and to clean, standardize, and correct low-quality texts, so that positive and negative texts become closer in quality, thereby reducing the model's positive case preference and improving its prediction performance and generalization ability.
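For readers unfamiliar with the two statistics behind the MIBE algorithm referred to in item (1), the following minimal sketch shows one common way to score a candidate string by mutual information and branch entropy. It is a generic illustration under stated assumptions, not the paper's implementation: the function names, the toy corpus, and the normalization of probabilities by total character count are all assumptions made here.

```python
import math
from collections import Counter


def ngram_counts(texts, max_n):
    """Count every character n-gram of length 1..max_n in the corpus."""
    counts = Counter()
    for text in texts:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_pmi(candidate, counts, total_chars):
    """Smallest pointwise mutual information over all two-way splits.

    A high value suggests the parts occur together far more often than
    chance, i.e. the candidate is internally cohesive.
    """
    if counts[candidate] == 0:
        raise ValueError("candidate does not occur in the corpus")
    p_whole = counts[candidate] / total_chars
    scores = []
    for k in range(1, len(candidate)):
        p_left = counts[candidate[:k]] / total_chars
        p_right = counts[candidate[k:]] / total_chars
        scores.append(math.log2(p_whole / (p_left * p_right)))
    return min(scores)


def branch_entropy(candidate, texts, side):
    """Entropy of the characters adjacent to the candidate on one side.

    A high value suggests the candidate appears in varied contexts,
    i.e. its boundary is free.
    """
    neighbours = Counter()
    for text in texts:
        start = text.find(candidate)
        while start != -1:
            idx = start - 1 if side == "left" else start + len(candidate)
            if 0 <= idx < len(text):
                neighbours[text[idx]] += 1
            start = text.find(candidate, start + 1)
    total = sum(neighbours.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in neighbours.values())


# Toy corpus (invented for illustration, not taken from the dataset).
corpus = ["哈哈蚌埠住了", "我蚌埠住了啊", "真的蚌埠住了吧", "住了三年"]
candidate = "蚌埠住了"

counts = ngram_counts(corpus, max_n=len(candidate))
total_chars = sum(len(t) for t in corpus)

mi_score = min_pmi(candidate, counts, total_chars)
left_be = branch_entropy(candidate, corpus, "left")
right_be = branch_entropy(candidate, corpus, "right")
print(mi_score, left_be, right_be)
```

A candidate would typically be kept as a new word only when both its MI score and the smaller of its two branch entropies exceed chosen thresholds; the thresholds themselves are not specified here.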

Conclusion
This paper presents a video danmaku sentiment analysis method based on MIBE-RoBERTa-FF-BiLSTM. It employs Maslow's hierarchy of needs theory to enhance sentiment annotation consistency, effectively identifies non-standard web-popular neologisms in danmaku text, and extracts semantic and structural information comprehensively. By learning word, character, and context information, the model better understands and models semantic and dependency relationships in danmaku text. It outperforms mainstream models in video danmaku sentiment classification. This research method offers a novel perspective on video danmaku sentiment analysis, serving as a valuable reference for related fields.

Figure 5. Logical structure diagram of Feature Fusion Layer.
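The averaging step of the feature fusion layer, as described in the discussion, can be sketched in a few lines. This is a minimal illustration under assumptions: character encodings are plain Python lists rather than tensors, the function name `fuse_word_features` is hypothetical, and how the fused vectors are combined with other features downstream is not modeled here.

```python
def fuse_word_features(char_vectors, words):
    """Replace each character vector with the mean vector of its word.

    char_vectors: one encoding (list of floats) per character.
    words: the segmentation of the same text, e.g. from a lexicon-based
           segmenter; the word lengths must add up to len(char_vectors).
    """
    if sum(len(w) for w in words) != len(char_vectors):
        raise ValueError("segmentation does not cover the character sequence")
    fused = []
    pos = 0
    for word in words:
        span = char_vectors[pos:pos + len(word)]
        # Average each dimension over the characters of this word.
        mean = [sum(dim) / len(span) for dim in zip(*span)]
        # Fill the averaged vector back into every original position.
        fused.extend(list(mean) for _ in span)
        pos += len(word)
    return fused


# Toy 2-dimensional encodings for the five characters of "蚌埠住了好"
# (values invented for illustration).
char_vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0], [2.0, 2.0]]
words = ["蚌埠住了", "好"]

fused = fuse_word_features(char_vectors, words)
print(fused)
```

The output keeps the original sequence length, so it can be passed to a sequence model in place of, or alongside, the character-level encodings.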
8 and PyTorch framework. Chinese-RoBERTa-WWM-EXT, Chinese-BERT-WWM-EXT and XLNet are used as pre-trained models with a dropout rate of 0.1, hidden size of 768, 12 hidden layers, and max length of 80. The BiLSTM model is used for sentiment text classification with a dropout rate of 0.5, hidden size of 64, batch size of 64, and 20 epochs. The model is trained using the Adam optimizer with a learning rate of 1e−5 and weight decay of 0.01.

Figure 7. Performance statistics of mainstream baseline model for sentiment analysis.

Figure 8. Performance statistics of mainstream baseline model with the introduction of the jieba lexicon and the FF layer.

Figure 9. Performance statistics of mainstream baseline model with the introduction of the MIBE-based lexicon and the FF layer.

Figure 10. Comprehensive statistics of the performance of the sentiment analysis model, respectively.
These Internet buzzwords contain rich semantic and emotional information, but are difficult for general-purpose lexical tools to recognize. The danmaku domain lexicon can effectively solve this problem by automatically

Table 1. Danmaku neologisms meaning.
1 …: used as a reminder to the rest of the audience that what's coming up next is very exciting, so be prepared
2 I can't laugh anymore. Self-wording: indicates that the content of the video is so hilarious that you can't help but let out a chuckle
3 I'll definitely do it next time …

Table 2. Examples of Danmaku emotion annotation based on Maslow's hierarchy of needs.

Table 3. Performance statistics of the sentiment analysis models.